Criteria for Choosing the Number of Clusters of the Binary Latent Class Model – Simulation Analysis

Robert Kapłon

DOI: https://doi.org/10.59139/ps.2010.01.5

Robert Kapłon, Politechnika Wrocławska, Instytut Organizacji i Zarządzania
Przegląd Statystyczny. Statistical Review, vol. 57, 2010, no. 1, pages 66-84
Published online: 31 March 2010



ABSTRACT

Latent class analysis requires the number of clusters to be known in advance. Information criteria can be used to decide on this number. The selection procedure is as follows: estimate several models with different numbers of classes, compute an information criterion for each, and choose the model for which the criterion takes the smallest value. Because there are many information criteria, one needs to determine which of them ought to be decisive. Unfortunately, because the criteria differ from one another, their reliability varies with the class of model, and simulations confirm this. Since existing simulations mainly concern finite mixtures of normal density functions, this paper extends the research to latent class analysis.
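The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the function names (`simulate`, `fit_lca`, `information_criteria`, `select_n_classes`), the simulated parameter values, and the choice of AIC and BIC as the two criteria are all hypothetical. The model is a binary latent class model with G classes and m items, fitted by a plain EM algorithm, with (G - 1) + G·m free parameters.

```python
import math
import random


def simulate(n, class_weights, item_probs, rng):
    """Draw n binary response vectors from a latent class model."""
    data = []
    for _ in range(n):
        k = rng.choices(range(len(class_weights)), weights=class_weights)[0]
        data.append([1 if rng.random() < p else 0 for p in item_probs[k]])
    return data


def fit_lca(data, n_classes, rng, n_iter=100, tol=1e-6):
    """Fit a binary latent class model by EM from one random start.

    Returns the log-likelihood evaluated in the last E-step.
    """
    n, m = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(n_classes)]
    prev = -math.inf
    loglik = prev
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every observation.
        loglik = 0.0
        post = []
        for x in data:
            joint = []
            for k in range(n_classes):
                lik = weights[k]
                for j in range(m):
                    lik *= probs[k][j] if x[j] else 1.0 - probs[k][j]
                joint.append(lik)
            total = sum(joint)
            loglik += math.log(total)
            post.append([v / total for v in joint])
        # M-step: update class weights and item-response probabilities.
        for k in range(n_classes):
            nk = max(sum(r[k] for r in post), 1e-12)  # guard against empty classes
            weights[k] = nk / n
            for j in range(m):
                num = sum(r[k] * x[j] for r, x in zip(post, data))
                # Keep probabilities strictly inside (0, 1) to avoid log(0).
                probs[k][j] = min(max(num / nk, 1e-6), 1.0 - 1e-6)
        if loglik - prev < tol:
            break
        prev = loglik
    return loglik


def information_criteria(loglik, n_params, n_obs):
    """AIC and BIC in their usual 'smaller is better' form."""
    return {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "BIC": -2.0 * loglik + n_params * math.log(n_obs),
    }


def select_n_classes(data, candidates, rng, n_starts=3):
    """Fit one model per candidate class count and pick the minimiser of each criterion."""
    m = len(data[0])
    table = {}
    for g in candidates:
        # Several random starts, because EM may stop at a local maximum.
        loglik = max(fit_lca(data, g, rng) for _ in range(n_starts))
        n_params = (g - 1) + g * m
        table[g] = information_criteria(loglik, n_params, len(data))
    best = {name: min(table, key=lambda g: table[g][name]) for name in ("AIC", "BIC")}
    return table, best


# Illustrative run: two well-separated classes, six binary items (assumed values).
rng = random.Random(0)
data = simulate(300, [0.5, 0.5], [[0.9] * 6, [0.1] * 6], rng)
table, best = select_n_classes(data, [1, 2, 3], rng)
for g, crit in table.items():
    print(g, round(crit["AIC"], 1), round(crit["BIC"], 1))
print("selected number of classes:", best)
```

Repeating such a run over many simulated data sets and counting how often each criterion recovers the true number of classes gives the kind of simulation comparison the abstract refers to. Different criteria may disagree on the same data, which is why the choice of the decisive criterion matters.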

KEYWORDS

latent class analysis, the number of clusters, information criteria, simulations
